# Self-Distillation Training
## SeerAttention/QwQ-32B-AttnGates

An attention-gating (AttnGates) weight adapter for the QwQ-32B model that accelerates long-context computation through dynamic block-level sparsity.

License: Apache-2.0 · Tags: Large Language Model, Transformers · Publisher: SeerAttention · Downloads: 35 · Likes: 3
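As a rough illustration of what dynamic block-level sparsity means here, the toy sketch below scores key blocks against pooled query blocks and keeps only the top-scoring fraction. This is not SeerAttention's AttnGates mechanism (which uses learned gates trained by self-distillation from full attention); the function name, block size, and the mean-pooling heuristic are all illustrative assumptions.

```python
import torch

def block_sparse_mask(q: torch.Tensor, k: torch.Tensor,
                      block: int = 64, keep_ratio: float = 0.25) -> torch.Tensor:
    """Toy block-level gate: True where a (query-block, key-block)
    pair should be computed, False where it can be skipped."""
    n_blocks = q.shape[0] // block          # assumes seq_len % block == 0
    qb = q[: n_blocks * block].view(n_blocks, block, -1).mean(dim=1)
    kb = k[: n_blocks * block].view(n_blocks, block, -1).mean(dim=1)
    scores = qb @ kb.T                      # (n_blocks, n_blocks) block relevance
    k_keep = max(1, int(keep_ratio * n_blocks))
    top = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros(n_blocks, n_blocks, dtype=torch.bool)
    mask.scatter_(1, top, True)             # keep only top-scoring key blocks
    return mask

q, k = torch.randn(256, 128), torch.randn(256, 128)
print(block_sparse_mask(q, k))              # 4x4 boolean block map
```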
## naver/splade-cocondenser-selfdistil

A SPLADE model for passage retrieval that improves retrieval effectiveness through sparse lexical document expansion and knowledge distillation.

Tags: Text Embedding, Transformers, English · Publisher: naver · Downloads: 16.11k · Likes: 10
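For context on how such a sparse representation is produced at inference time, the sketch below follows the standard SPLADE recipe: masked-LM logits are passed through a log-saturated ReLU, masked to real tokens, and max-pooled over positions, yielding a vocabulary-sized sparse vector; query-document relevance is the dot product. The Hub id matches the listing above, but treat this as a minimal sketch rather than naver's reference implementation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "naver/splade-cocondenser-selfdistil"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

def splade_vector(text: str) -> torch.Tensor:
    """Return a vocabulary-sized sparse representation of `text`."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits             # (1, seq_len, vocab_size)
    weights = torch.log1p(torch.relu(logits))       # log-saturated ReLU
    weights = weights * inputs["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)     # max-pool -> (vocab_size,)

q = splade_vector("what causes aurora borealis")
d = splade_vector("The northern lights are caused by solar particles.")
print(float(q @ d))  # dot-product relevance score
```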
## cambridgeltl/trans-encoder-bi-simcse-roberta-large

An unsupervised sentence encoder based on RoBERTa-large, trained with self-distillation and mutual distillation, suited to sentence-similarity tasks.

Tags: Text Embedding, Transformers · Publisher: cambridgeltl · Downloads: 17 · Likes: 0
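A minimal way to use a bi-encoder like this for sentence similarity is sketched below: encode each sentence, pool, and compare with cosine similarity. [CLS]-token pooling is assumed here because the model follows the SimCSE setup; check the model card for the exact pooling used.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "cambridgeltl/trans-encoder-bi-simcse-roberta-large"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[:, 0]  # [CLS] pooling, assumed as in SimCSE

a = embed("A man is playing a guitar.")
b = embed("Someone is playing an instrument.")
print(torch.nn.functional.cosine_similarity(a, b).item())
```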